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HYPOTHESIS TEST FOR NORMAL MIXTURE MODELS: 
THE EM APPROACH 

By Jiahua Chen 1 and Pengfei Li 

University of British Columbia and University of Alberta 

Normal mixture distributions are arguably the most important 
mixture models, and also the most technically challenging. The likeli- 
hood function of the normal mixture model is unbounded based on a 
set of random samples, unless an artificial bound is placed on its com- 
ponent variance parameter. Moreover, the model is not strongly iden- 
tifiable so it is hard to differentiate between over dispersion caused 
by the presence of a mixture and that caused by a large variance, and 
it has infinite Fisher information with respect to mixing proportions. 
There has been extensive research on finite normal mixture models, 
but much of it addresses merely consistency of the point estimation 
or useful practical procedures, and many results require undesirable 
restrictions on the parameter space. We show that an EM-test for 
homogeneity is effective at overcoming many challenges in the con- 
text of finite normal mixtures. We find that the limiting distribution 
of the EM-test is a simple function of the 0.5xo + 0.5x? and Xi dis- 
tributions when the mixing variances are equal but unknown and 
the xl when variances are unequal and unknown. Simulations show 
that the limiting distributions approximate the finite sample distri- 
bution satisfactorily. Two genetic examples are used to illustrate the 
application of the EM-test. 

1. Introduction. The class of finite normal mixture models has many ap- 
plications. More than a hundred years ago, Pearson (1894) modeled a set of 
crab observations with a two-component normal mixture distribution. In ge- 
netics, such models are often used for quantitative traits influenced by major 
genes. Roeder (1994) discusses an example in which the blood chemical con- 
centration of interest is influenced by a major gene with additive effects [see 
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Schork, Allison and Thiel (1996)] for additional examples in human genet- 
ics. Normal mixture models are also used to account for heterogeneity in the 
age of onset for male and female schizophrenia patients [Everitt (1996)], and 
used in hematology studies [McLaren (1996)]. They play a fundamental role 
in cluster analysis [Tadesse, Sha and Vannucci (2005) and Raftery and Dean 

(2006) ], and in the study of the false discovery rate [Efron (2004), McLach- 
lan, Bean and Ben-Tovim (2006), Sun and Cai (2007) and Cai, Jin and Low 

(2007) ]. In financial economics they are used for daily stock returns [Kon 
(1984)]. 

Contrary to intuition, of all the finite mixture models, the normal mix- 
ture models have the most undesirable mathematical properties. Their likeli- 
hood functions are unbounded unless the component variances are assumed 
equal or constrained, the Fisher information can be infinite and the strong 
identifiability condition is not satisfied. We demonstrate these points in the 
following example. 

Example 1. Let Xi, . . . ,X n be a random sample from the following 
normal mixture model: 

(1.1) (l-a)N(0 1 ,a 2 ) + aN(9 2 ,a 2 2 ). 

Let f(x,6,a) be the density function of a normal distribution with mean 9 
and variance a 2 . The likelihood function is given by 

n 

(1.2) l n (a,0 1 ,02,<Ti,a 2 )=J2log{(l-a)f(X i ;0 1 ,<T 1 ) + af(X i ;6 2 ,a 2 )}: 



1. Unbounded likelihood function. The log-likelihood function is unbounded 
for any given n because when 9\ = X\ , < a < 1 , it goes to infinity when 
o\ ->0. 

2. Infinite Fisher information. For each given 9i,9 2 , a\ and a 2 , we have 



Sn 



dl n (a,9 1 ,9 2 ,ai,a 2 ) 



da 



a=0 



^XfiXf^cji) V 



Under the homogeneous model in which 0\ = 0, a\ = 1 and a = 0, that 
is, iV(0, 1), the Fisher information 

E{S 2 } = oo, whenever o\>2. 

3. Loss of strong identifiability. It can be seen that 



d 2 f(x;9,a) 



89 2 



df(x;9,a) 



(0,<r>)=(O,l) 



3(<7 2 



9,<t 2 )=(0,1) 



This is in violation of the strong identifiability condition introduced in 
Chen (1995). 
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The above properties of finite normal mixture models are in addition to 
other undesirable properties of general finite mixture models. In Hartigan 
(1985), Liu, Pasarica and Shao (2003) and Liu and Shao (2004), the likeli- 
hood ratio statistic is shown to diverge to infinity as the sample size in- 
creases, which forces most researchers to restrict the mixing parameter (#) 
into some compact space. Without which, Hall and Stewart (2005) find the 
likelihood ratio test can only consistently detect alternative models at dis- 
tance (n _1 log log n) 1 ! 2 rather than at the usual distance n -1 / 2 . The partial 
loss of identifiability, when 6\ = 02, once forced in a technical separate con- 
dition, \6± — 02 1 > £ > [Ghosh and Sen (1985)]. This condition has recently 
been shown to be unnecessary by many authors, for instance, Garel (2005). 

The unbounded likelihood prevents straightforward use of maximum like- 
lihood estimation. Placing an additional constraint on the parameter space 
[e.g., Hathaway (1985)] or adding a penalty function [Chen, Tan and Zhang 
(2008)] to the log likelihood regains the consistency and efficiency of the 
maximum constrained or penalized likelihood estimators. 

The loss of strong identifiability results in a lower best possible rate 
of convergence [Chen (1995) and Chen and Chen (2003)]. Furthermore, 
it invalidates many elegant asymptotic results such as those in 
Dacunha-Castelle and Gassiat (1999), Chen, Chen and Kalbfleisch (2001) 
and Charnigo and Sun (2004). Finite Fisher information is a common hid- 
den condition of these papers, but it did not gain much attention until the 
paper of Li, Chen and Marriott (2008). 

Due to the indisputable importance of finite normal mixture models, de- 
veloping valid and useful statistical procedures is an urgent task, particu- 
larly for the test of homogeneity. Yet the task is challenging for the reasons 
presented. Many existing results used simulated quantiles of the correspond- 
ing statistics [see Wolfe (1971), McLachlan (1987) and Feng and McCulloch 
(1994)]. Without rigorous theory, however, it is difficult to reconcile their 
varying recommendations. 

In this paper, we investigate the application of the EM-test 
[Li, Chen and Marriott (2008)] to finite normal mixture models and show 
that this test provides a most satisfactory and general solution to the prob- 
lem. Interestingly, our asymptotic results do not require any constraints on 
the mean and variance parameters or compactness of the parameter space. 

In Section 2, we present the result for the normal mixture model (1.1) 
when a\ = a 2 = a 2 . The limiting distribution of the EM-test is shown to be a 
simple function of the 0.5^0 + 0.5xi an d the Xi distributions. In Section 3, we 
present the result for the general normal mixture model (1.1). The limiting 
distribution of the EM-test is found to be the \\- Both results are stunningly 
simple and convenient to apply. In both cases, we conduct simulation studies 
and the outcomes are in good agreement with the asymptotic results. In 
Section 4, we give two genetic examples. For convenience of the presentation, 
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the proofs are outlined in the Appendix and included in a technical report 
[Chen and Li (2008)]. 

2. Normal mixture models in the presence of the structural parameter. 

When o\ = oi = a and a is unknown in model (1.1), we call a the structural 
parameter. We are interested in the test of the homogeneity null hypothesis 

H :a(l-a)(6 1 -6 2 ) = 

under this assumption. Without loss of generality, we assume < a < 0.5. 

Because the population variance Var(Xi) is the sum of the component 
variance a 2 and the variance of the mixing distribution a(l — a){6\ — 9%) 2 , 
a 1 is often underestimated by straight likelihood methods. Furthermore, 
most asymptotic results are obtained by approximating the likelihood func- 
tion with some form of quadratic function [Liu and Shao (2003), Marriott 
(2007)]. The approximation is most precise when the fitted a value is away 
from and 1. Based on these considerations, we recommend using the mod- 
ified log likelihood 

pl n (a,6i,9 2 ,cr) = l n (a,9 1 ,e 2 ,cr,a) +p n (a) + p(a) 

with l n (-) given in (1.2). We usually select p n (a) such that it is bounded 
when a is large, but goes to negative infinity as a goes to 0, and p(a) such 
that it is maximized at a = 0.5 and goes to negative infinity as a goes to 
or 1. Concrete recommendations will be given later. 

To construct the EM-test, we first choose a set of aj G (0, 0.5] , j = 1, 2, . . . , J, 

and a positive integer K. For each j = 1, 2, . . . , J, let ap = aj and compute 
W?i . 0« » °? ) = ar § max ^- {af> , 0i , 6 2 , a) . 

01,02,0- 

For i = 1, 2, ... ,n and the current k, we use an E-step to compute 
and then update a and other parameters by an M-step such that 



a 3 



n 

- argmax< I n - ^wff j log(l - a) + ^wff log(a) + p(a) 




and 



9 (, +1)) ^ + i) )£j ( fc+1 ) )=arg 



=1 



2 n 

max w ij ] lo &{f( X i ]O h ,a)}+p n (a) 

2,(7 h=l i=l 



The E-step and the M-step are iterated K — 1 times. 
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For each k and j, we define 

Mi k \a j ) = 2{pl n (af\ef 1 \ef 2 \af ) )- P l n (l/2j J ,a )}, 

where (0 ,a ) = &rgmax 9a pl n (l/2,9,9,a). 
The EM-test statistic is then defined as 

EM™ = max{M^)(a j ) : j = 1, . . . , J}. 

We reject the null hypothesis when EM^^ exceeds some critical value to be 
determined. 

Consider the simplest case where J = K = 1 and a\ = 0.5. In this case, 
the EM-test is the likelihood ratio test against the alternative models with 
known a = 0.5. The removal of one unknown parameter in the model sim- 
plifies the asymptotic property of the (modified) likelihood ratio test, and 
the limiting distribution becomes the 0.5xo + 0.5xi which does not require 
the parameter space of 6 to be compact. The price of this simplicity is a 
loss of efficiency when the data are from an alternative model with a ^ 0.5. 
Choosing J > 1 initial values of a reduces the efficiency loss because the true 
a value can be close to one of the initial values. The EM-iteration updates 
the value of Oj and moves it toward the true a-value while retaining the 
nice asymptotic property. 

Specific choice of initial set of a values is not crucial in general. This is an- 
other benefit of the EM-iteration. The updated a-values from either a = 0.3 
or a = 0.4 are likely very close after two iterations. Hence, we recommend 
{0.1,0.3,0.5}. If some prior information indicates that the potential a value 
under the alternative model is low, then choosing {0.01,0.025,0.05,0.1} can 
improve the power of the test. We do not investigate the potential refine- 
ments further but leave them as a future research project at this stage. 

The idea of the EM-test was introduced by Li, Chen and Marriott (2008) 
for mixture models with a single mixing parameter. Yet finite normal mix- 
ture models do not fit into the general theory and pose specific technical 
challenges. The asymptotic properties of the EM-test will be presented in 
the next section. The recommendation for penalty functions will be given in 
Section 2.2. 

2.1. Asymptotic properties. We study the asymptotic properties of the 
EM-test under the following conditions on the penalty functions p{a) and 

Pn(cr): 

CO. p(a) is a continuous function such that it is maximized at a = 0.5 
and goes to negative infinity as a goes to or 1. 
CI. sup{|p n (a)|:o->0} = o(n). 
C2. The derivative p' n (<j) = o p (n 1//4 ) at any a > 0. 
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We allow p n to be dependent on the data. To ensure that the EM-test 
has the invariant property, we recommend choosing a p n that also satisfies 
the following: 

C3. p n (aa;aX 1 + b, . . . , aX n + b) = p n (a; X 1 , . . . , X n ). 

The following intermediate results reveal some curious properties of the 
finite normal mixture model. 

Theorem 1. Suppose conditions CO, CI, and C2 hold. Under the null 
distribution N(6q,Gq) we have, for j = 1, . .. , J and any k < K, the follow- 
ing: 

(a) i/oj=0.5, then: 

-0 O = cyn-vs), 9 (k) _ 9q = 0p(n -i/8 )> 

af - a 3 = O p (n"V4 )) a (k) _ ffQ = 0p(jl -i/^ 

(b) if < aj < 0.5, then: 

af - a, = O p (n~ 1 /4 )) a f) _ aQ = 0p {n^). 

Note that the convergence rates of (Oji, ^ , °^ ) depend on the choice 

of initial a value, and it singles out a = 0.5. Even when a,\ = 0.5, a\ ^ 0.5 
when k > 1. However, this does not reduce case (a) to case (b) because 
= 0.5 + o p (l) rather than equaling a nonrandom constant a.\ ^ 0.5. 

Theorem 2. Suppose conditions CO, CI, ane? C2 /ioW and a\ = 0.5. 
Then, under the null distribution N(0q,Oq) and for any finite K as n—> oo, 

Pt(EM$P <x)^ F(x - A){0.5 + 0.5F(x)}, 

where F(x) is the cumulative density function (CDF) of the xi an d 

A = 2 max {p(aj) -p(0.5)}. 

Oj^0.5 

To shed some light on the nonconventional results, we reveal some help- 
ful momental relationships. Without loss of generality, assume that under 
the null model 6± = 62 = and a 2 = 1. The EM-test or other likelihood- 
based methods fit the data from the null model with an alternative model 
(1 — a)N(9i,a 2 ) + aN(02, o~ 2 ). Asymptotically, the fit matches the first few 
sample moments. When a = 0.5 is presumed, the first three moments of a 
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homogeneous model and an alternative model can be made identical with 
proper choice of the values of the remaining parameters. Which model fits 
the data better is revealed through the fourth moment, 

E(Xf) = 3-(6f + 9i) < 3. 

Thus, for local alternatives, we may as well test 

H : E(Xf) = 3 versus H a : E(Xf) < 3. 

The parameter of this null hypothesis is on the boundary so that the null 
limiting distribution of M„(0.5) is the 0.5xo + 0.5%f . 

When a = ao £ (0,0.5), the first two moments of the null and alternative 
models can be made identical, but their third moments differ because 

E(Xf) = (l- a )6f + a 9l 

which can take any value in a neighborhood of 0. Thus, for local alternatives, 
we may as well test 

H :E(Xf)=0 versus H a :E(Xf)^0. 

Because the null hypothesis is an interior point, M n (ao) has the asymptotic 
distribution \\ + 2{p(«o) — p(0.5)} in which 2{p(oo) — p(0.5)} is due to the 
penalty. 

Since the sample third and fourth moments are asymptotically orthogo- 
nal, the limiting distribution of the EM-test involves the maximum of two 
independent distributions, the x\ an d the 0.5xo + 0.5%f , and a term caused 
by the penalty p(a). This is the result as in the above theorem. 

The order assessment results in Theorem 1 can be similarly explained. 
If a = 0.5 is presumed, the fitted fourth moment of the mixing distribution 
will be O p (n~ 1 / 2 ) and hence both fitted 9\ and 9 2 are O p (n~ 1 / 8 ). For other 
a values, the fitted third moment is O p (n~ l l 2 ), which implies that the fitted 
6»i and 9 2 are C^n" 1 ^). 

2.2. Simulation results. We demonstrate the precision of the limiting dis- 
tribution via simulation and explore the power properties. Among several ex- 
isting results, the modified likelihood ratio test (MLRT) in Chen and Kalbfleisch 
(2005) is known to have an accurate asymptotic upper bound. Thus, we also 
include this method in our simulation. The likelihood ratio test (LRT) is 
included due to its popularity among researchers and simulated its critical 
values. 

The key idea of the MLRT is to define the modified likelihood function 

as 

l n (a,6i,6 2 ,a) = l n (a,9 1 ,9 2 ,a) +p(a) 



J. CHEN AND P. LI 



and the recommended penalty function is log{4a(l — a)}. The corresponding 
statistic is defined as 

M n = 2 { l n ( a , 9 1 , 9 2 , a ) - I n ( . 5 , 6 , 6 , a ) } , 

where (a, 61,62, a) and (0.5, 6q, 9q, do) maximize l n under the alternative and 
null models, respectively. Unlike that for the EM-test, the limiting distri- 
bution of M n is unknown but is shown to have an upper bound xi when 
6 is confined in a compact space. Chen and Kalbfleisch (2005) show that 
the type I errors of the MLRT with critical values determined by the X2 
distribution are close to the nominal values. 

For the EM-test statistics, we choose the penalty function 

Pn(o-) = ~{sl/a 2 + log(a 2 /s 2 n )}, 

where s£ = n~ l £f =1 pQ - X) 2 with X = n~ l £? =1 X { . 

It can be seen that (a) p n (cr) satisfies conditions C1-C3; (b) it effectively 
places an inverse gamma prior on a 2 ; (c) it allows a closed- form expression 
for (Tj , and (d) it is maximized at a 2 = s^. In fact, even a constant function 
p n (&) satisfies C1-C2. This choice of p n (cr) prevents under estimation of a 2 
and plays a role of higher-order adjustment. 

For the penalty function p(a), we choose p(a) = log(l — |1 — 2a\). We refer 
to Li, Chen and Marriott (2008) for reasons of this choice. The combination 
of p n {&) and pip) results in accurate type I errors for the EM-test. 

We conducted the simulation with two groups of initial values for a: 
(0.1,0.2, 0.3,0.4,0.5) and (0.1,0.3,0.5). We generated 20,000 random sam- 
ples from iV(0, 1) with sample size n (n = 100,200). The simulated null re- 
jection rates are summarized in Table 1. The EM-test and the MLRT both 

(2) 

have accurate type I errors, especially EM n with the three initial values 
(0.1,0.3,0.5) for a. 

We selected four models for power assessment. The parameter settings are 
shown in rows 2-5 of Table 2. The powers of the EM-test, the MLRT and the 
LRT are estimated based on 5,000 repetitions and are presented in Table 3. 
We used the simulated critical values to ensure fairness of the comparison. 
The results show that the EM-test statistics based on three initial values 
have almost the same power as those from five initial values. Combining the 

type I error results and the power comparison results, we recommend the 

(2) 

use of EM n with three initial values (0.1,0.3,0.5) for a. 

The EM-test has higher power when the mixing proportion a is close to 
0.5, while the MLRT statistic performs better when a is close to 0. The 
powers of the LRT and the MLRT are close under all models. However, 
the limiting distribution of the EM-test is obtained without any restrictions 
on the model, while the limiting distribution of the MLRT or of the LRT 
is unknown, and the upper bound result for the MLRT is obtained under 
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Table 1 







Type I errors 


(%) of the 


EM-test 


and the MLRT 






Level 


EM™ 


EM™ 


EM™ 


EM™ 


EM™ 


EM™ 


MLRT 










n= 100 








10% 


8.9 


9.1 


9.2 


9.2 


9.9 


10.2 


10.9 


5% 


4.6 


4.8 


4.8 


4.6 


5.1 


5.3 


5.7 


1% 


0.9 


1.0 


1.0 


0.9 


1.0 


1.1 


1.2 










n = 200 








10% 


9.3 


9.4 


9.5 


9.7 


10.0 


10.3 


9.8 


5% 


4.6 


4.8 


4.8 


4.7 


5.0 


5.1 


5.0 


1% 


1.0 


1.1 


1.1 


0.9 


1.1 


1.1 


1.1 



Results in columns (2, 3, 4) used a = (0.1, 0.2, 0.3, 0.4, 0.5). 
Results in columns (5,6,7) used a = (0.1,0.3,0.5). 



some restrictions. When a is small and some prior information on a value 
is known, the lower efficiency problem of the EM-test can be easily fixed. 
We conducted additional simulation by choosing the set of initial a-values 
{0.1,0.05,0.025,0.01}. In this case, the limiting distribution of the EM-test 
becomes xl + 2{p(0.1) -p(0.5)}. When n = 100, the power comparison be- 
tween the EM-test and the MLRT becomes 71.5% versus 71.0% for model 
III, and 73.1% versus 75% for model IV. Therefore, the EM-test can be re- 
fined in many ways to attain higher efficiency. Naturally, a systematic way 
is preferential and is best left to a future research project. 

The other eight models in Table 2 have unequal variances, which are 
mainly selected for power comparisons in Section 3.3. To examine the im- 



Table 2 

Parameter values of normal mixture models for power assessment 





1 - a 


0i 


6>2 




<T2 


Model I 


0.50 


-1.15 


1.20 


1.00 


1.00 


Model II 


0.25 


-1.15 


1.15 


1.00 


1.00 


Model III 


0.10 


-1.30 


1.30 


1.00 


1.00 


Model IV 


0.05 


-1.55 


1.55 


1.00 


1.00 


Model V 


0.50 








1.20 


0.50 


Model VI 


0.25 








1.15 


0.50 


Model VII 


0.10 








1.40 


0.50 


Model VIII 


0.05 








1.85 


0.50 


Model IX 


0.50 


0.75 


-0.75 


1.20 


0.80 


Model X 


0.25 


0.65 


-0.65 


1.20 


0.80 


Model XI 


0.10 


0.85 


-0.85 


1.20 


0.80 


Model XII 


0.05 


1.15 


-1.15 


1.20 


0.80 
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Table 3 

Powers (%) of the EM-test, the MLRT and the LRT at 5% level 



Model 


EM ( n x) 




EM^ 


EM ( n x) 


em£ } 


EM^ 


MLRT 


LRT 










n = 


100 








1 


53.4 


53.2 


52.8 


53.8 


53.4 


53.4 


45.2 


45.1 


II 


51.8 


51.7 


51.6 


50.3 


50.5 


50.7 


50.7 


51.0 


III 


51.9 


52.2 


52.2 


50.7 


51.3 


51.7 


59.2 


59.2 


IV 


49.5 


51.2 


51.5 


50.7 


51.6 


52.0 


63.1 


64.4 


V 


15.2 


17.0 


17.6 


16.0 


17.8 


18.1 


33.4 


34.1 


IX 


49.4 


49.3 


49.1 


48.1 


48.7 


48.6 


48.3 


48.4 










n = 


200 








1 


85.2 


85.2 


85.1 


85.3 


85.4 


85.3 


80.1 


78.6 


11 


85.0 


84.9 


84.9 


84.7 


84.8 


84.7 


84.3 


83.2 


111 


86.0 


86.1 


86.1 


85.7 


85.8 


85.9 


90.9 


89.8 


IV 


81.4 


82.3 


82.5 


82.5 


83.1 


83.2 


91.1 


90.2 


V 


23.0 


25.0 


25.9 


24.4 


26.0 


26.9 


52.9 


55.3 


IX 


82.3 


82.2 


82.2 


81.8 


82.0 


82.0 


82.1 


81.3 



Results in columns (2,3,4) used a = (0.1, 0.2, 0.3, 0.4, 0.5). 
Results in columns (5,6,7) used a = (0.1,0.3,0.5). 



portance of the equal variance assumption, we applied the current EM-test 
designed for finite normal mixture models in the presence of a structural 
parameter to the data from models V and IX. In some sense, model V is a 
null model because its two component means are equal, while model IX is 
an alternative model because its two component means are unequal. It can 
be seen in Table 3 that the current EM-test has a rightfully low rejection 
rate against model V. This property is not shared by the MLRT. At the 
same time, the current EM-test has good power for detecting model IX. In 
fact, the power is comparable to that of the EM-test designed for finite nor- 
mal mixture models with unequal variances, to be introduced in the next 
section. We conclude that when a\jai is close to 1, the power of the current 
EM-test is not sensitive to the o\ = 02 assumption. 

To explore what happens when a\jai is large, we generated data from 
model IX with o\ reset to 2.4. The current EM-test rejected the null hy- 
pothesis 84% of the time, compared to a 96% rejection rate for the EM-test 
designed for finite mixture models without an equal variance assumption 
when n = 100. We conclude that when the two component variances are 
rather different, the current EM-test should not be used. An EM-test de- 
signed for finite mixture models without an equal variance assumption is 
preferred. 
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3. Normal mixture models in both mean and variance parameters. 

3.1. The EM-test procedure. In this section, we apply the EM-test to the 
test of homogeneity in the general normal mixture model (1.1) where both 
9 and a are mixing parameters. We wish to test 

H :a(l-a) = or (0i,of) = (0 2 ,of). 

Compared to the case where a is a structural parameter, the asymp- 
totic properties of likelihood-based methods become much more challenging 
because of the unbounded log-likelihood and infinite Fisher information. Es- 
pecially because of the latter, there exist few asymptotic results for general 
finite normal mixture models. Interestingly, we find that the EM-test can 
be directly applied and the asymptotic distribution is particularly simple. 
However, its derivation is complex. 

To avoid the problem of unbounded likelihood, adding a penalty becomes 
essential in our approach. We define 

pl n (a,9i,02,o-i,a 2 ) = Z n (a,6>i,# 2 ,o"i,cr 2 ) +p n {^i) + Pnfa) +£>(«)> 

where p n (cr), p(a) are the same as before. 

The EM-test statistic is constructed similarly. We first choose a set of 
ctj G (0, 0.5], j = 1, 2, . . . , J and a positive integer K. For each j = 1, 2, . . . , J, 

let ap = ctj and compute 

(9jl , 9$ , af^ , af^ ) = arg max pl n {ay , 6>i , 9 2 , o\ , a 2 ) . 

01 ,6*2,(71 ,(72 



For % = 1, 2, ... ,n and the current k, we use the E-step to compute 

(k) _ a j !{ X i^j2 >°j ) 



W 



13 (1 - af )/(*,; $\o-f) + af /(A^gUf ) 
and then we use the M-step to update a and other parameters such that 

a i k + l ) _ ar g max J ( n _ ^2 w jj J l°g(l — a) + ^2 w ij l°g( a ) + p(,oi) 



8=1 / 1 = 1 



and 



„(fc+l) n(k+l) (fe+1) 



2 

argmax 

6l,^2,(71,(72 fo—i 



(k) 
'J 

The E-step and the M-step are iterated K — 1 times. 



log{/(*<;0fc.*h)}+PnH) 

i=l 
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For each k and j, we define 

Mi k \ aj )=2{pl n ( a f\0f 1 \ef 2 \af 1 \af 2 ) )-pUl/2j o J o ,a o ,a o )}^ 

where (9o,&o) = argmax^ a pl n (1/2, 9, 9, a, a). The EM-test statistic is then 
defined as 

EM™ = max{M^)(a j ) : j = 1, . . . , J}. 

We reject the null hypothesis when EM^^ exceeds some critical value to be 
determined. 

In terms of statistical procedure, the EM-test for the case of a\ = a\ is 
a special case of a\ / a\. However, the asymptotic distributions and their 
derivations are different. 



3.2. Asymptotic properties. We further require that p n {o-) satisfies CI 
and: 

C4. p' n (a) = Opin 1 ^), for all a > 0. 

C5. p n (o~) < 4(logn) 2 log(er), when a < n" 1 and n is large. 

The following theorems consider the consistency of (c^- , 9^ , 9^ , crj^ , ) 
and give the major result. 



Theorem 3. Suppose conditions CO, CI and C4-C5 hold. Under the 
null distribution N(9o,o~q) we have, for j = 1, . . . , J, h = l,2 and any k < K, 

af^ - aj = o p (l), 9$ - 9 = o p (l) and af^ - a = o p (l). 

Theorem 4. Suppose conditions CO, CI and C4-C5 hold. When a\ = 
0.5, under the null distribution N(9q, ctq) and for any finite K as n — > 00, 

It is a surprise that the EM-test has a simpler limiting distribution when 
applied to a more complex model. We again shed some light on this via some 
moment consideration. 

The test of homogeneity is to compare the fit of the null iV(0, 1) and the 
fit of the full model. The limiting distribution amounts to considering this 
problem when the data are from the null model. By matching the first two 
moments of the full model to the first two sample moments, we roughly 
select a full model such that 



(1 - a)9 x + a9 2 = and (1 - a){9\ + af) + a(9% + = 1. 
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n — 

-1.0 



-0.5 



1 

o.o 

Third moment 



0.5 



1.0 



Fig. 1. The range (area inside the solid line) of {E(Xi), E(Xi)}. 



Let {5\ = Q\ + g\ — \. When the value of a = uq £ (0, 0.5] (say ao = 0.5), the 
third moment and the fourth moment of the full model are 

E(Xf) = 39 1 (3 1 , 

E(Xf) = 3/?? - 29\ + 3. 

It is easy to verify that {E(Xf), E(Xf)} = {0, 3} if and only if the mixture 
model is the homogeneous model. Therefore, we may as well test 

H :{E(Xf),E(Xf)} = {0,3} versus H a :{E(Xf),E(Xf)} ^ {0,3}. 

As shown in Figure 1, {0,3} is an interior point of the parameter space of 
{E(Xf ), E(Xf)}. Therefore, the null limiting distribution of the EM-test is 
the xl- We note that when the observations are from an alternative model, 
the situation is totally different. A test on moments is not equivalent to the 
EM-test. 

3.3. Simulation studies. We demonstrate the precision of the limiting 
distribution and explore the power properties via simulations. In contrast 
to the case where o\ = fr|, the EM-test does not have many competitors. 
Thus, we set up an MLRT method with 

M n = 2< sup pl n (ct, 0i, 2 , (Ti, a 2 ) -pl n (0.5, §o,9o,&o,vo) 

La,f?l,^2,crijO"2 

Although the limiting distribution of M n is not available, we simulate the 
critical values and use the MLRT as an efficiency barometer. 
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Wesuggest using the penalty function p n (a) = —0.25{s n / a 2 + log(a 2 / s n )} , 
which is almost the same as before except for the coefficient because we have 
two penalty terms in this problem. Our simulation shows that this choice 
works well in terms of providing accurate type I errors. We use p(ot) = 
log(l — |1 — 2a\) according to the recommendation of Li, Chen and Marriott 
(2008). 

In the simulations, the type I errors were calculated based on 20,000 
samples from iV(0, 1). As in Section 2.2, we used two groups of initial val- 
ues (0.1,0.2,0.3,0.4,0.5) and (0.1,0.3,0.5) to calculate EM n K) . The simu- 
lation results are summarized in Table 4. The EM-test statistics based on 
(0.1,0.3,0.5) give accurate type I errors. 

The powers of the EM-test and the MLRT for the models in Table 2 are 
calculated based on 5,000 repetitions and presented in Table 5. Since the 
limiting distribution of the MLRT is unavailable and hence is not a viable 
method, the simulated critical values were used for power calculation. The 

(2) (3) 

simulation results show that the EM n and EM n based on three initial 
values (0.1, 0.3, 0.5) for a have almost the same power as the MLRT. Further 
increasing the number of iterations or the number of initial values for a does 
not increase the power of the EM-test statistics. We therefore recommend 

the use of EM n or EMn based on three initial values (0.1,0.3,0.5) for a. 

We note that when a\ = 02, the current EM-test loses some power com- 
pared to the EM-test designed for finite mixture models in the presence of 
a structural parameter if the mixing parameter a is close to 0.5, but it has 
higher power when a is near or 1. Nevertheless, we recommend the use of 
the current EM-test if the equal variance assumption is likely violated. 

4. Genetic applications. 



Table 4 
Type I errors (%) of the EM-test 



Level 


EM™ 


EM™ 


EM™ 


EM™ 


EM™ 


EM™ 








n 


= 100 






10% 


10.8 


10.9 


10.9 


10.5 


10.6 


10.6 


5% 


5.5 


5.5 


5.6 


5.3 


5.4 


5.4 


1% 


1.2 


1.2 


1.2 


1.1 


1.2 


1.2 








11 


= 200 






10% 


10.7 


10.7 


10.7 


10.4 


10.5 


10.5 


5% 


5.4 


5.4 


5.4 


5.1 


5.2 


5.2 


1% 


1.1 


1.1 


1.1 


1.0 


1.0 


1.0 



Results in columns (2,3,4) used a = (0.1, 0.2, 0.3, 0.4, 0.5). 
Results in columns (5,6,7) used a = (0.1,0.3,0.5). 
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Table 5 

Powers (%) of the EM-test and the MLRT at the 5% level 



Model 


EM\ > 


nil ,r(2/ 

EM\ ' 




nil Wl) 

EMn ' 


, ,(2) 

EM\ ' 




MLRT 










n= 100 








1 


44.0 


44.0 


43.9 


44.1 


43.8 


43.8 


44.0 


T T 
11 


4:1 . I 


4/.y 


A *7 O 

4i\8 


/i *7 c: 
4/.0 


^7 r 

4(.0 


A*7 A 
4 ( .4 


J7 n 


T T T 
111 


K K rr 

00.0 


rr r 
00.0 


00.4 


cc rr c 
00.6 


k cr p. 
00.0 


rr rr rr 
00.0 


cr k rr 
00.0 


IV 


56.9 


56.9 


56.8 


57.4 


56.9 


56.8 


56.8 


V 


rr o a 
08.0 


CT Q A 

00.4 


K O A 

08.4 


08.8 


08.8 


ro 7 

08. / 


rr o o 
08.2 


V 1 


ao rr 

63.0 


03.3 


P. o o 

63.3 


(?n 7 

63. 1 


63.0 


03.6 


coo 
63.2 


Vll 


66.8 


66.6 


66.6 


66.9 


66.8 


66.8 


66.6 


lfTTT 

Vlll 


67.3 


67.2 


67.2 


67.4 


67.2 


67.1 


67.4 


T V 

1A 


AO a 
48.9 


AO Q 

48.8 


48. / 


49.1 


AO O 

48.8 


(Q 7 

48. / 


48. ( 


X 


54.6 


54.6 


54.5 


55.0 


54.8 


54.6 


54.3 


XI 


56.5 


56.5 


56.5 


57.0 


56.6 


56.6 


56.3 


XII 


57.1 


57.1 


57.0 


57.4 


57.1 


57.0 


57.0 










n= 200 








I 


78.3 


78.2 


78.2 


78.3 


78.2 


78.2 


78.2 


II 


82.0 


81.9 


81.9 


82.2 


82.1 


82.1 


81.9 


111 


88.6 


88.6 


88.6 


88.8 


88.7 


88.7 


88.5 


IV 


88.7 


88.6 


88.6 


88.9 


88.8 


88.8 


88.5 


V 


90.0 


89.9 


89.9 


90.1 


90.0 


90.0 


89.8 


VI 


91.6 


91.5 


91.5 


91.7 


91.6 


91.6 


91.5 


VII 


91.4 


91.3 


91.3 


91.5 


91.4 


91.4 


91.3 


VIII 


89.6 


89.6 


89.6 


89.6 


89.5 


89.5 


89.7 


IX 


81.7 


81.5 


81.5 


81.9 


81.8 


81.7 


81.4 


X 


88.1 


88.0 


88.0 


88.3 


88.2 


88.1 


87.9 


XI 


86.4 


86.3 


86.3 


86.5 


86.4 


86.4 


86.2 


XII 


87.7 


87.6 


87.6 


87.9 


87.9 


87.8 


87.5 



Results in columns (2,3,4) used a = (0.1, 0.2, 0.3, 0.4, 0.5). 
Results in columns (5,6,7) used a = (0.1,0.3,0.5). 



Example 2. We apply the EM-test to the example discussed in Loisel et 
al. (1994). Due to the potential use for hybrid production, cytoplasmic male 
sterility in plant species is a trait of much scientific and economic interest. To 
efficiently use this character, it is important to find nuclear genes — preferably 
dominant ones — that induce fertility restoration [MacKenzie and Bassett 
(1987)]. Loisel et al. (1994) carried out an experiment for detecting a major 
restoration gene. In this experiment, 150 F2 bean plants were obtained. The 
number of pods with one up to a maximum of ten grains were then counted 
on each F2 plant. Loisel et al. (1994) suggested analyzing the square root of 
the total number of grains for each plant. If a major restoration gene exists, 
the normal mixture model will provide a more suitable fit; otherwise, the 
single normal distribution best fits the data. The histogram of the trans- 
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formed counts is given in Figure 2. It indicates the existence of two modes, 
and an unequal variance normal mixture model is a good choice. 

Based on some genetic background, Loisel et al. (1994) postulated a three- 
component normal mixture model 

(4.1) \N(e u a 2 ) + \N(9 2 ,a 2 ) + \N(9 3 ,a 2 ) 

and tested the null hypothesis that 6\ = 62 = 03- They found that the limiting 
distribution of the LRT statistic is a 50-50 mixture of the Xi an d x\ > an d the 
resulting p-value is 0.002%. We investigated the null rejection rates of the 
LRT under model (4.1) when n = 150 and the critical values were determined 
by a 50-50 mixture of the x\ an d x\ limiting distributions. Based on 40,000 
repetitions, the simulated null rejection rates were 15.6%, 8.8% and 2.2% 
for nominal values of 10%, 5% and 1%. The above p-value may be biased 
toward the liberal side. 

For illustration purposes, we re-analyzed the data with the EM-test under 

model (1.1) with o\ = a\. The p- value of the MLRT calibrated with the x\ 

(2) 

distribution was found to be 1.4%. We found EM\> = 6.827 with three initial 
values (0.1,0.3,0.5) for a, corresponding to the p-value 1.0%. It can be seen 
that the EM-test provides stronger evidence against the null model than the 
MLRT test. 

It appears that the equal variance assumption is not suitable. We con- 
sider the EM-test for a finite normal mixture with unequal variance. We 
found that EM& = 15.966 and EM {2) = 20.590 with three initial values 




5 10 



Fig. 2. The histogram, of the square root of the total number of grains per plant, the fitted 
densities of the normal mixtures in (1.1) (solid line) and in (4-1) (dashed line). 
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(0.1,0.3,0.5) for a, resulting in the p- values 0.03% and 0.003%, respectively. 
Further iteration does not change the p- value much. This result is in line with 
the outcome of Loisel et al. (1994). The modified MLES of (a, #i, #2, 01,(72) 
are (0.175,10.663,2.535,3.203,1.080), confirming that o\ 7^02 and explain- 
ing why the EM-test under the general model gives much stronger evidence 
against the null model. 

Figure 2 shows the fitted density functions of models (1.1) and (4.1). Our 
analysis indicates that a two-component mixture model can fit the data just 
as well as the model suggested by Loisel et al. (1994). The question of which 
model is more appropriate is not the focus of this paper. 

Example 3. The second example considers the data presented in 
Everitt, Landau and Leese (2001); see part (b) of Table 6.2. This data set 
is from a schizophrenia study reported by Levine (1981), who collated the 
results of seven studies on the age of onset of schizophrenia including 99 
females and 152 males. We use the male data to illustrate the use of the 
EM-test. As suggested by Levine (1981), there are two types of schizophre- 
nia in males. The first type is diagnosed at a younger age and is generally 
more severe; the second type is diagnosed later in life. We wish to test the 
existence of the two types of schizophrenia. 

Everitt, Landau and Leese (2001) fitted the 152 observations using a two- 
component normal mixture model, and used the LRT to test the homogene- 
ity. Using the xi distribution for calibration, they found the p- value was less 
than 0.01%. Following Everitt (1996), our analysis is based on logarithmic 
transformed data. Assuming model (1.1) with of = o|, the p- value of the 
MLRT calibrated with the xi distribution is 1.8%, but EM^ = with three 
initial values (0.1,0.3,0.5) for a. 

Removing the o\ = 02 assumption, we find that EM^ = 13.301 and 
EM n = 13.323 with three a initial values (0.1,0.3,0.5) and both p-values 
are 0.1%. The modified MLEs of (a, {J,i,fJ, 2 , 01, o 2 ) are (0.448,1.379,1.319, 
0.192,0.071). Our analysis indicates that there are two subpopulations in 
the population with close mean ages of onset but different variances. This 
also explains why the EM-test designed for finite mixture models in the 
presence of a structural parameter is insignificant. 

Figure 3 contains the histogram and the fitted densities. It can be seen 
that the mixture model with unequal variances fits better. We also computed 
the LRT statistic which equals 15.27 under the unequal variance assumption. 
If it is calibrated with the xi distribution, as suggested by Wolfe (1971), the 
p-value is 0.4%, and if calibrated with the xi distribution, as suggested by 
McLachlan (1987), the p- value is 1.8%. Without a solid theory, it would be 
hard to reconcile these inconsistent outcomes. 
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0,8 



1.0 



1.2 



1.4 



1,6 



Fig. 3. The histogram of log age of onset for male schizophrenics, the fitted densities of 
the single normal model (dashed line) and normal mixture model in (1.1) (solid line). 



To save space, detailed proofs will be given in a technical report [Chen and Li 
(2008)]. We only include an outline in this appendix. 

For both limiting distribution conclusions in Theorems 2 and 4, the first 
key step is to show that any estimator with a bounded away from and 1, 
and with a large likelihood value, is consistent for 6 and a under the null 
model, whether under the equal variance or unequal variance assumption. 

The second key step is to show that after finite number of EM- iterations, 
ol^ keeps a nondeminishing distance from and 1, and the likelihood value 
is large. Hence, the conclusion in the first key step is applicable. 

(k) (k)\ 

With both fitted values (0\j , 0^ ■') in a small neighborhood of #0j the true 
value under the null model, the likelihood function is approximated by a 
quadratic function. This is the third key step. 

When otj = 0.5, under the equal variance model, the quadratic approxi- 
mation leads to the expansion 



APPENDIX: PROOFS 



MW(0.5) 



{(S? = i y»)~} 2 



+ Op(l). 



When Oj 7^ 0.5, the expansion becomes 




p(0.5)}+o p (l). 



Therefore, 



EM™ = max{M W (ay ) , j = 1, 2, . . 



J} 
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max 



+ Op(l). 



We omit the definitions of Ui and Vi but point out that Yd=iUi/\/n an d 
53iLi Vi/y/n are jointly asymptotical bivariate normal and independent. Con- 
sequently, the limiting distribution of EM^ K ^ is given by F(x — A){0.5 + 
0.5F(x)} with F(x) being the CDF of the xi distribution. 

Under the unequal variance assumption, the asymptotic expansion is 
found to be 

Consequently, the limiting distribution of EM^ is the x\- 
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